Abstract

We present two approaches for linear prediction of long-memory time series. The first approach consists in truncating the Wiener-Kolmogorov predictor by restricting the observations to the last $k$ terms, which are the only available values in practice. We derive the asymptotic behaviour of the mean-squared error as $k$ tends to $+\infty$. By contrast, the second approach is non-parametric. An AR($k$) model is fitted to the long-memory time series and we study the error that arises in this misspecified model.

Keywords: long-memory, linear model, autoregressive process, forecast error

ARMA (autoregressive moving-average) processes are often called short-memory processes because their covariances decay rapidly (i.e. exponentially). On the other hand, a long-memory process is characterised by the following feature: the autocovariance function $\sigma$ decays more slowly, i.e. it is not absolutely summable. Such processes are so named because of the strong association between observations widely separated in time. Long-memory time series models have attracted much attention lately and there is now a growing realisation that time series possessing long-memory characteristics arise in subject areas as diverse as Economics, Geophysics, Hydrology or telecom traffic (see, e.g., [Mandelbrot and Wallis, 1969] and [Granger and Joyeux, 1980]). Although there exists substantial literature on the prediction of short-memory processes (see [Bhansali, 1978] for the univariate case or [Lewis and Reinsel, 1985] for the multivariate case), there are fewer results for long-memory time series. In this paper, we consider the question of the prediction of the latter.

More precisely, we compare two prediction methods for long-memory processes. Our goal is a linear predictor of the future value $X_{k+h}$ based on the observed time points which is optimal in the sense that it minimizes the mean-squared error. The paper is organized as follows. First we introduce our model and our main assumptions. Then, in Section 2, we study the best linear predictor, i.e. the Wiener-Kolmogorov predictor proposed by [Whittle, 1963] and by [Bhansali and Kokoszka, 2001] for long-memory time series. In practice, only the last $k$ values of the process are available. Therefore we need to truncate the infinite series in the definition of the predictor and to derive the asymptotic behaviour of the mean-squared error as $k$ tends to $+\infty$.

In Section 3 we discuss the asymptotic properties of the forecast error if we fit a misspecified AR($k$) model to a long-memory time series. This approach has been proposed by [Ray, 1993] for fractional noise series F($d$). His simulations show that high-order AR models predict fractionally integrated noise very well.

Finally, in Section 4, we compare the two previous methods for $h$-step prediction. We give some asymptotic properties of the mean-squared error of the linear least-squares predictor as $h$ tends to $+\infty$ in the particular case of long-memory processes. Then we study our $k$-th order predictors as $k$ tends to $+\infty$.



1 Model 

Let $(X_n)_{n\in\mathbb{Z}}$ be a discrete-time (weakly) stationary process in $L^2$ with mean 0 and $\sigma$ its autocovariance function. We assume that the process $(X_n)_{n\in\mathbb{Z}}$ is a long-memory process, i.e.:

$$\sum_{k=-\infty}^{+\infty} |\sigma(k)| = \infty.$$

The process $(X_n)_{n\in\mathbb{Z}}$ admits an infinite moving average representation as follows:

$$X_n = \sum_{j=0}^{\infty} b_j \varepsilon_{n-j}, \tag{1}$$

where $(\varepsilon_n)_{n\in\mathbb{Z}}$ is a white-noise series consisting of uncorrelated random variables, each with mean zero and variance $\sigma_\varepsilon^2$, and $(b_j)_{j\in\mathbb{N}}$ are square-summable. We shall further assume that $(X_n)_{n\in\mathbb{Z}}$ admits an infinite autoregressive representation:

$$\varepsilon_n = \sum_{j=0}^{\infty} a_j X_{n-j}, \tag{2}$$

where the $(a_j)_{j\in\mathbb{N}}$ are absolutely summable. We assume also that $(a_j)_{j\in\mathbb{N}}$ and $(b_j)_{j\in\mathbb{N}}$, occurring respectively in (2) and (1), satisfy the following conditions for all $\delta > 0$:

$$|a_j| \le C_1\, j^{-d-1+\delta} \tag{3}$$
$$|b_j| \le C_2\, j^{d-1+\delta} \tag{4}$$

where $C_1$ and $C_2$ are constants and $d$ is a parameter verifying $d \in\, ]0, 1/2[$. For example, a FARIMA process $(X_n)_{n\in\mathbb{Z}}$ is the stationary solution to the difference equations:

$$\phi(B)(1-B)^d X_n = \theta(B)\varepsilon_n$$

where $(\varepsilon_n)_{n\in\mathbb{Z}}$ is a white noise series, $B$ is the backward shift operator and $\phi$ and $\theta$ are polynomials with no zeroes on the unit disk. Its coefficients verify

$$|a_j| \underset{+\infty}{\sim} C_1\, j^{-d-1}, \qquad |b_j| \underset{+\infty}{\sim} C_2\, j^{d-1},$$

and thus (3) and (4) hold. When $\phi = \theta = 1$, the process $(X_n)_{n\in\mathbb{Z}}$ is called fractionally integrated noise and denoted F($d$). More generally, $(a_j)_{j\in\mathbb{N}}$ and $(b_j)_{j\in\mathbb{N}}$ verify conditions (3) and (4) if:

$$|a_j| \underset{+\infty}{\sim} L(j)\, j^{-d-1}, \qquad |b_j| \underset{+\infty}{\sim} L'(j)\, j^{d-1},$$

where $L$ and $L'$ are slowly varying functions. A positive function $L$ is a slowly varying function in the sense of [Zygmund, 1968] if, for any $\delta > 0$, $x \mapsto x^{-\delta}L(x)$ is decreasing and $x \mapsto x^{\delta}L(x)$ is increasing.
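To make conditions (3) and (4) concrete, the following minimal sketch generates the coefficients of fractionally integrated noise F($d$) by the standard recursions for the expansions of $(1-B)^{d}$ and $(1-B)^{-d}$. The function names are illustrative assumptions, and $a_0 = b_0 = 1$ is taken as the normalisation.

```python
import math

def fd_ar_coeffs(d, n):
    """a_0..a_n of (1 - B)^d: a_0 = 1 and a_j = a_{j-1} * (j - 1 - d) / j."""
    a = [1.0]
    for j in range(1, n + 1):
        a.append(a[-1] * (j - 1 - d) / j)
    return a

def fd_ma_coeffs(d, n):
    """b_0..b_n of (1 - B)^(-d): b_0 = 1 and b_j = b_{j-1} * (j - 1 + d) / j."""
    b = [1.0]
    for j in range(1, n + 1):
        b.append(b[-1] * (j - 1 + d) / j)
    return b

d, n = 0.3, 2000
a = fd_ar_coeffs(d, n)
b = fd_ma_coeffs(d, n)

# The two expansions are inverse to each other: the convolution of (a_j) and
# (b_j) vanishes at every lag >= 1.
conv = [sum(a[i] * b[m - i] for i in range(m + 1)) for m in range(1, 50)]

# Normalised tails: j^(d+1) |a_j| Gamma(-d) and j^(1-d) b_j Gamma(d) approach 1,
# which is the power-law behaviour required by (3) and (4).
a_limit = n ** (d + 1) * abs(a[n]) * abs(math.gamma(-d))
b_limit = n ** (1 - d) * b[n] * math.gamma(d)
print(max(abs(c) for c in conv), a_limit, b_limit)
```

The recursions avoid evaluating Gamma functions at large arguments, which keeps the sketch numerically stable for long coefficient sequences.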

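Condition (4) implies that the autocovariance function $\sigma$ of the process $(X_n)_{n\in\mathbb{Z}}$ verifies:

$$\forall \delta > 0,\ \exists C_3 \in \mathbb{R},\quad |\sigma(j)| \le C_3\, j^{2d-1+\delta}. \tag{5}$$

Notice that it suffices to prove (5) for $\delta$ near 0 in order to verify (5) for $\delta > 0$ arbitrarily chosen. So we prove (5) for $\delta < 1/2 - d$:

$$\begin{aligned}
\sigma(k) &= \sigma_\varepsilon^2 \sum_{j=0}^{+\infty} b_j b_{j+k} \\
|\sigma(k)| &\le \sigma_\varepsilon^2 \Big( \sum_{j=1}^{+\infty} |b_j b_{j+k}| + |b_0 b_k| \Big) \\
&\le \sigma_\varepsilon^2 \Big( C_2^2 \sum_{j=1}^{+\infty} j^{d-1+\delta}(k+j)^{d-1+\delta} + |b_0 b_k| \Big) \\
&\le \sigma_\varepsilon^2 \Big( C_2^2 \int_0^{+\infty} j^{d-1+\delta}(k+j)^{d-1+\delta}\,dj + |b_0 b_k| \Big) \\
&\le \sigma_\varepsilon^2 \Big( C_2^2\, k^{2d-1+2\delta} \int_0^{+\infty} j^{d-1+\delta}(1+j)^{d-1+\delta}\,dj + C_2 |b_0|\, k^{d-1+\delta} \Big) \\
&\le C_3\, k^{2d-1+2\delta}.
\end{aligned}$$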

More accurately, [Inoue, 1997] has proved that if:

$$b_j \sim L(j)\, j^{d-1}$$

then

$$\sigma(j) \sim j^{2d-1}\,[L(j)]^2\, \beta(1-2d, d)$$

where $L$ is a slowly varying function and $\beta$ is the beta function. The converse is not true: we must have more assumptions about the series $(b_j)_{j\in\mathbb{N}}$ in order to get an asymptotic equivalent for $(\sigma(j))_{j\in\mathbb{N}}$ (see [Inoue, 2000]).
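The equivalent above can be checked numerically for fractionally integrated noise, for which $b_j \sim j^{d-1}/\Gamma(d)$, so that $L = 1/\Gamma(d)$. The sketch below is an illustration under that assumption, with $\sigma_\varepsilon^2 = 1$; the function name is hypothetical. It uses the standard closed form $\sigma(0) = \Gamma(1-2d)/\Gamma(1-d)^2$ and the ratio $\sigma(j)/\sigma(j-1) = (j-1+d)/(j-d)$.

```python
import math

def fd_autocov(d, n, sigma2=1.0):
    """sigma(0..n) of F(d): closed-form sigma(0), then a one-term recursion."""
    s = [sigma2 * math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for j in range(1, n + 1):
        s.append(s[-1] * (j - 1 + d) / (j - d))
    return s

d, n = 0.2, 5000
sigma = fd_autocov(d, n)

# Equivalent j^(2d-1) L^2 beta(1-2d, d) with L = 1/Gamma(d).
beta = math.gamma(1 - 2 * d) * math.gamma(d) / math.gamma(1 - d)
equiv = n ** (2 * d - 1) * beta / math.gamma(d) ** 2
ratio = sigma[n] / equiv

# Long memory: partial sums of the autocovariances keep growing with the range.
partial_small = sum(sigma[:500])
partial_large = sum(sigma)
print(ratio, partial_small, partial_large)
```

The growing partial sums illustrate the non-summability that defines long memory in Section 1.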

2 Wiener-Kolmogorov Next Step Prediction Theory 
2.1 Wiener-Kolmogorov Predictor 

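The aim of this part is to compute the best linear one-step predictor (with minimum mean-square distance from the true random variable) knowing all the past $\{X_{k+1-j},\ j \ge 1\}$. Our predictor is therefore an infinite linear combination of the infinite past:

$$\hat X_k(1) = \sum_{j=0}^{\infty} \lambda(j)\, X_{k-j}$$

where $(\lambda(j))_{j\in\mathbb{N}}$ are chosen to ensure that the mean squared prediction error

$$E\big[(\hat X_k(1) - X_{k+1})^2\big]$$

is as small as possible. Following [Whittle, 1963], and in view of the moving average representation of $(X_n)_{n\in\mathbb{Z}}$, we may rewrite our predictor $\hat X_k(1)$ as:

$$\hat X_k(1) = \sum_{j=0}^{\infty} \phi(j)\, \varepsilon_{k-j}$$

where $(\phi(j))_{j\in\mathbb{N}}$ depends only on $(\lambda(j))_{j\in\mathbb{N}}$, and $(\varepsilon_n)_{n\in\mathbb{Z}}$ and $(a_j)_{j\in\mathbb{N}}$ are defined in (2). From the infinite moving average representation of $(X_n)_{n\in\mathbb{Z}}$ given above in (1), with $b_0 = 1$, we can rewrite the mean-squared prediction error as:

$$\begin{aligned}
E\big[(\hat X_k(1) - X_{k+1})^2\big]
&= E\Big[\Big(\sum_{j=0}^{\infty}\phi(j)\varepsilon_{k-j} - \sum_{j=0}^{\infty} b_j \varepsilon_{k+1-j}\Big)^2\Big] \\
&= E\Big[\Big(\varepsilon_{k+1} - \sum_{j=0}^{\infty}\big(\phi(j) - b_{j+1}\big)\varepsilon_{k-j}\Big)^2\Big] \\
&= \Big(1 + \sum_{j=0}^{\infty}\big(b_{j+1} - \phi(j)\big)^2\Big)\sigma_\varepsilon^2
\end{aligned}$$

since the random variables $(\varepsilon_n)_{n\in\mathbb{Z}}$ are uncorrelated with variance $\sigma_\varepsilon^2$. The smallest mean-squared prediction error is obtained when setting $\phi(j) = b_{j+1}$ for $j \ge 0$.

The smallest prediction error of $(X_n)_{n\in\mathbb{Z}}$ is $\sigma_\varepsilon^2$ within the class of linear predictors. Furthermore, if

$$A(z) = \sum_{j=0}^{+\infty} a_j z^j$$

denotes the characteristic polynomial of the $(a_j)_{j\in\mathbb{N}}$ and

$$B(z) = \sum_{j=0}^{+\infty} b_j z^j$$

that of the $(b_j)_{j\in\mathbb{N}}$, then in view of the identity $A(z) = B(z)^{-1}$, $|z| < 1$, we may write:

$$\hat X_k(1) = -\sum_{j=1}^{\infty} a_j X_{k+1-j}. \tag{6}$$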
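As a worked illustration of (6), consider fractionally integrated noise F($d$), whose autoregressive coefficients are those of the expansion of $(1-B)^d$: $a_0 = 1$ and $a_j = a_{j-1}(j-1-d)/j$. The first weights are therefore

$$a_1 = -d, \qquad a_2 = -\frac{d(1-d)}{2}, \qquad a_3 = -\frac{d(1-d)(2-d)}{6},$$

so that the Wiener-Kolmogorov predictor starts as

$$\hat X_k(1) = d\,X_k + \frac{d(1-d)}{2}\,X_{k-1} + \frac{d(1-d)(2-d)}{6}\,X_{k-2} + \dots$$

All weights are positive and decay like $j^{-d-1}$, which is why the distant past still contributes noticeably to the forecast.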
2.2 Mean Squared Prediction Error when the Predictor is Truncated

In practice, we only know a finite subset of the past, the one which we have observed. So the predictor should only depend on the observations. Assume that we only know the set $\{X_1, \dots, X_k\}$ and that we replace the unknown values by 0; then we have the following new predictor:

$$\hat X'_k(1) = -\sum_{j=1}^{k} a_j X_{k+1-j}. \tag{7}$$

It is equivalent to say that we have truncated the infinite series (6) to $k$ terms. The following proposition provides the asymptotic properties of the mean squared prediction error as a function of $k$.

Proposition 2.2.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a linear stationary process defined by (1), (2) and verifying conditions (3) and (4). We can approximate the mean-squared prediction error of $\hat X'_k(1)$ by:

$$\forall \delta > 0,\quad E\big([X_{k+1} - \hat X'_k(1)]^2\big) = \sigma_\varepsilon^2 + O(k^{-1+\delta}).$$

Furthermore, this rate of convergence $O(k^{-1})$ is optimal since for fractionally integrated noise, we have the following asymptotic equivalent:

$$E\big([X_{k+1} - \hat X'_k(1)]^2\big) = \sigma_\varepsilon^2 + C k^{-1} + o(k^{-1}).$$

Note that the prediction error is the sum of $\sigma_\varepsilon^2$, the error of the Wiener-Kolmogorov model, and the error due to the truncation to $k$ terms, which is bounded by $O(k^{-1+\delta})$ for all $\delta > 0$.
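The rate stated in Proposition 2.2.1 can be observed numerically for fractionally integrated noise. The sketch below is an illustration only, assuming $\sigma_\varepsilon^2 = 1$ and hypothetical function names. It uses the fact that the truncation error $X_{k+1} - \hat X'_k(1) = \sum_{j=0}^{k} a_j X_{k+1-j}$ is a finite linear combination, so its variance is the finite quadratic form $\sum_{j,l=0}^{k} a_j a_l \sigma(j-l)$.

```python
import math

def fd_ar_coeffs(d, n):
    """a_0..a_n of (1 - B)^d."""
    a = [1.0]
    for j in range(1, n + 1):
        a.append(a[-1] * (j - 1 - d) / j)
    return a

def fd_autocov(d, n):
    """sigma(0..n) of F(d) with unit innovation variance."""
    s = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for j in range(1, n + 1):
        s.append(s[-1] * (j - 1 + d) / (j - d))
    return s

def truncated_one_step_mse(d, k):
    """E[(X_{k+1} - X'_k(1))^2] as a finite quadratic form in a_0..a_k."""
    a = fd_ar_coeffs(d, k)
    s = fd_autocov(d, k)
    return sum(a[j] * a[l] * s[abs(j - l)]
               for j in range(k + 1) for l in range(k + 1))

d = 0.4
# Excess over the innovation variance; k * excess should level off as k grows.
excess = {k: truncated_one_step_mse(d, k) - 1.0 for k in (50, 100, 200)}
for k, e in excess.items():
    print(k, e, k * e)
```

Doubling $k$ should roughly halve the excess error, in line with the $k^{-1}$ rate; the sketch does not attempt to identify the constant.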

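Proof.

$$\begin{aligned}
X_{k+1} - \hat X'_k(1) &= X_{k+1} - \hat X_k(1) + \hat X_k(1) - \hat X'_k(1) \\
&= X_{k+1} - \sum_{j=0}^{+\infty} b_{j+1}\varepsilon_{k-j} - \sum_{j=k+1}^{+\infty} a_j X_{k+1-j} \\
&= \varepsilon_{k+1} - \sum_{j=k+1}^{+\infty} a_j X_{k+1-j}.
\end{aligned} \tag{8}$$

The two parts of the sum (8) are orthogonal for the inner product associated with the mean square norm. Consequently:

$$E\big([X_{k+1} - \hat X'_k(1)]^2\big) = \sigma_\varepsilon^2 + \sum_{j=k+1}^{\infty}\sum_{l=k+1}^{\infty} a_j a_l\, \sigma(l-j).$$

For the second term of the sum we have:

$$\begin{aligned}
\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(l-j)\Big|
&= \Big| 2\sum_{j=k+1}^{+\infty} a_j \sum_{l=j+1}^{+\infty} a_l\, \sigma(l-j) + \sum_{j=k+1}^{+\infty} a_j^2\, \sigma(0) \Big| \\
&\le 2\sum_{j=k+1}^{+\infty} |a_j||a_{j+1}||\sigma(1)| + \sum_{j=k+1}^{+\infty} a_j^2\, \sigma(0)
+ 2\sum_{j=k+1}^{+\infty} |a_j| \sum_{l=j+2}^{+\infty} |a_l||\sigma(l-j)|
\end{aligned}$$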
j=k+l l=j+2 
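from the triangle inequality. It follows that:

$$\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(l-j)\Big| \le C_1^2 C_3 \Big( 2\sum_{j=k+1}^{+\infty} j^{-d-1+\delta}(j+1)^{-d-1+\delta} + \sum_{j=k+1}^{+\infty} \big(j^{-d-1+\delta}\big)^2 \Big) \tag{9}$$

$$\qquad\qquad + 2 C_1^2 C_3 \sum_{j=k+1}^{+\infty} j^{-d-1+\delta} \sum_{l=j+2}^{+\infty} l^{-d-1+\delta}\, |l-j|^{2d-1+\delta} \tag{10}$$

for all $\delta > 0$ from inequalities (3) and (5). Assume now that $\delta < 1/2 - d$. For the terms (9), since $j \mapsto j^{-d-1+\delta}(j+1)^{-d-1+\delta}$ is a positive and decreasing function on $\mathbb{R}^+$, we have the following approximations:

$$2 C_1^2 C_3 \sum_{j=k+1}^{+\infty} j^{-d-1+\delta}(j+1)^{-d-1+\delta} \sim 2 C_1^2 C_3 \int_k^{+\infty} j^{-d-1+\delta}(j+1)^{-d-1+\delta}\,dj \sim \frac{2 C_1^2 C_3}{1+2d-2\delta}\, k^{-2d-1+2\delta}.$$

Since the function $j \mapsto \big(j^{-d-1+\delta}\big)^2$ is also positive and decreasing, we can establish in a similar way that:

$$C_1^2 C_3 \sum_{j=k+1}^{+\infty} \big(j^{-d-1+\delta}\big)^2 \sim C_1^2 C_3 \int_k^{+\infty} \big(j^{-d-1+\delta}\big)^2\,dj \sim \frac{C_1^2 C_3}{1+2d-2\delta}\, k^{-2d-1+2\delta}.$$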

For the infinite double series (10), we will similarly compare the series with an integral. In the next Lemma, we establish the necessary result for this comparison:

Lemma 2.2.1. Let $g$ be the function $(l, j) \mapsto j^{-d-1+\delta}\, l^{-d-1+\delta}\, |l-j|^{2d-1+\delta}$. Let $m$ and $n$ be two positive integers. We assume that $\delta < 1 - 2d$ and $m > \dfrac{\delta - d - 1}{\delta + 2d - 1}$. We will call $A_{n,m}$ the square $[n, n+1] \times [m, m+1]$. If $n \ge m + 1$ then

$$\int_{A_{n,m}} g(l, j)\, dj\, dl \ge g(n+1, m).$$



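Assume now that $\delta < 1 - 2d$ without loss of generality. Thanks to the previous Lemma and the asymptotic equivalents of (9), there exists $K \in \mathbb{N}$ such that if $k > K$:

$$\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(l-j)\Big| \le C \int_{k+1}^{+\infty} j^{-d-1+\delta} \Big[ \int_{j}^{+\infty} l^{-d-1+\delta}(l-j)^{2d-1+\delta}\,dl \Big] dj + O\big(k^{-2d-1+2\delta}\big).$$

By using the substitution $j l' = l$ in the integral over $l$ we obtain:

$$\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(l-j)\Big| \le C \int_{k+1}^{+\infty} j^{-2+3\delta} \int_{1}^{+\infty} l^{-d-1+\delta}(l-1)^{2d-1+\delta}\,dl\,dj + O\big(k^{-2d-1+2\delta}\big).$$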
Since, if $\delta < (1-d)/2$,

$$\int_{1}^{+\infty} l^{-d-1+\delta}(l-1)^{2d-1+\delta}\,dl < +\infty,$$

it follows:

$$\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(l-j)\Big| \le O\big(k^{-1+3\delta}\big) + O\big(k^{-2d-1+2\delta}\big) \le O\big(k^{-1+3\delta}\big). \tag{11}$$

If $\delta > 0$, $\delta < 1 - 2d$ and $\delta < (1-d)/2$, we have:

$$\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(l-j)\Big| = O\big(k^{-1+3\delta}\big).$$

Notice that if the equality is true under the assumptions $\delta > 0$, $\delta < 1 - 2d$ and $\delta < (1-d)/2$, it is also true for any $\delta > 0$. Therefore we have proven the first part of the proposition.
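We prove now that there exist long-memory processes whose prediction error attains the rate of convergence $k^{-1}$. Assume now that $(X_n)_{n\in\mathbb{Z}}$ is fractionally integrated noise F($d$), which is the stationary solution of the difference equation:

$$X_n = (1 - B)^{-d} \varepsilon_n \tag{12}$$

with $B$ the usual backward shift operator, $(\varepsilon_n)_{n\in\mathbb{Z}}$ a white-noise series and $d \in\, ]0, 1/2[$ (see for example [Brockwell and Davis, 1991]). We can compute the coefficients and obtain that:

$$\forall j > 0,\quad a_j = \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)} \qquad\text{and}\qquad \forall j \ge 0,\quad \sigma(j) = \frac{(-1)^j\, \Gamma(1-2d)}{\Gamma(j-d+1)\Gamma(1-j-d)}\, \sigma_\varepsilon^2;$$

then we have:

$$\forall j > 0,\ a_j < 0 \qquad\text{and}\qquad \forall j \ge 0,\ \sigma(j) > 0$$

and

$$a_j \sim \frac{j^{-d-1}}{\Gamma(-d)} \qquad\text{and}\qquad \sigma(j) \sim \frac{j^{2d-1}\, \Gamma(1-2d)}{\Gamma(d)\Gamma(1-d)}\, \sigma_\varepsilon^2 \qquad\text{when } j \to \infty.$$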

In this particular case, we can estimate the prediction error more precisely. Since all the products $a_j a_l \sigma(l-j)$ are positive,

$$\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(l-j) = \sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} |a_j a_l\, \sigma(l-j)| \sim \frac{\Gamma(1-2d)\Gamma(2d)}{\Gamma(-d)^2\, \Gamma(d)\, \Gamma(1+d)}\, k^{-1}. \tag{13}$$

The asymptotic bound $O(k^{-1})$ is therefore as small as possible. □
In the specific case of fractionally integrated noise, we may write the prediction error as:

$$E\big([X_{k+1} - \hat X'_k(1)]^2\big) = \sigma_\varepsilon^2 + C(d)\, k^{-1} + o(k^{-1})$$

and we can express $C(d)$ as a function of $d$:

$$C(d) = \frac{\Gamma(1-2d)\Gamma(2d)}{\Gamma(-d)^2\, \Gamma(d)\, \Gamma(1+d)}. \tag{14}$$

It is easy to prove that $C(d) \to +\infty$ as $d \to 1/2$ and we may write the following asymptotic equivalent as $d \to 1/2$:

$$C(d) \sim \frac{1}{(1-2d)\, \Gamma(-1/2)^2\, \Gamma(1/2)\, \Gamma(3/2)}. \tag{15}$$

As $d \to 0$, $C(d) \to 0$ and we have the following equivalent as $d \to 0$:

$$C(d) \sim d^2.$$
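The behaviour of the constant near $d = 1/2$ can be inspected directly. The sketch below simply evaluates expression (14) as stated and compares it with the equivalent (15); the function names are illustrative assumptions.

```python
import math

def c_const(d):
    """Constant C(d) of (14), evaluated as printed."""
    return (math.gamma(1 - 2 * d) * math.gamma(2 * d)
            / (math.gamma(-d) ** 2 * math.gamma(d) * math.gamma(1 + d)))

def c_equiv_half(d):
    """Equivalent (15) of C(d) as d -> 1/2."""
    return 1.0 / ((1 - 2 * d) * math.gamma(-0.5) ** 2
                  * math.gamma(0.5) * math.gamma(1.5))

for d in (0.1, 0.3, 0.45, 0.49, 0.4999):
    print(d, c_const(d), c_equiv_half(d))
```

Since $\Gamma(-1/2)^2\,\Gamma(1/2)\,\Gamma(3/2) = 2\pi^2$, the equivalent (15) reads $C(d) \sim 1/(2\pi^2(1-2d))$, which makes the blow-up at $d = 1/2$ explicit.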



Figure 2.1: Behaviour of the constant $C(d)$, $d \in [0, 1/2[$, defined in (14) (horizontal axis: $d$).

As Figure 2.1 suggests and the asymptotic equivalent given in (15) proves, the mean-squared error tends to $+\infty$ as $d \to 1/2$. By contrast, the constant $C(d)$ takes small values for $d$ in a large interval of $[0, 1/2[$. Although the rate of convergence has a constant order $k^{-1}$, the forecast error is bigger when $d \to 1/2$. This result is not surprising since the correlation between the random variable which we want to predict and the random variables which we take equal to 0 increases when $d \to 1/2$.

Truncating to $k$ terms the series which defines the Wiener-Kolmogorov predictor amounts to using an AR($k$) model for predicting. Therefore in the following section we look for the AR($k$) which minimizes the forecast error.

3 The Autoregressive Models Fitting Approach 

In this section we develop a generalisation of the "autoregressive model fitting" approach developed by [Ray, 1993] in the case of fractionally integrated noise F($d$) (defined in (12)). We study the asymptotic properties of the forecast mean-squared error when we fit a misspecified AR($k$) model to the long-memory time series $(X_n)_{n\in\mathbb{Z}}$.

3.1 Rationale 

Let $\Phi$ be a $k$-th degree polynomial defined by:

$$\Phi(z) = 1 - a_{1,k} z - \dots - a_{k,k} z^k.$$

We assume that $\Phi$ has no zeroes on the unit disk. We define the process $(\eta_n)_{n\in\mathbb{Z}}$ by:

$$\forall n \in \mathbb{Z},\quad \eta_n = \Phi(B) X_n$$

where $B$ is the backward shift operator. Note that $(\eta_n)_{n\in\mathbb{Z}}$ is not a white noise series because $(X_n)_{n\in\mathbb{Z}}$ is a long-memory process and hence does not belong to the class of autoregressive processes. Since $\Phi$ has no root on the unit disk, $(X_n)_{n\in\mathbb{Z}}$ admits a moving-average representation as the fitted AR($k$) model in terms of $(\eta_n)_{n\in\mathbb{Z}}$:

$$X_n = \sum_{j=0}^{\infty} c(j)\, \eta_{n-j}.$$

If $(X_n)_{n\in\mathbb{Z}}$ were an AR($k$) process associated with the polynomial $\Phi$, the best next-step linear predictor would be:

$$\widetilde X_n(1) = \sum_{j=1}^{\infty} c(j)\, \eta_{n+1-j} = a_{1,k} X_n + \dots + a_{k,k} X_{n+1-k} \quad\text{if } n \ge k.$$

Here $(X_n)_{n\in\mathbb{Z}}$ is a long-memory process which verifies the assumptions of Section 1. Our goal is to derive a closed formula for the polynomial $\Phi$ which minimizes the forecast error and to estimate this error.

3.2 Mean-Squared Error 

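There exist two approaches to define the coefficients of the $k$-th degree polynomial $\Phi$: the spectral approach and the time approach.

In the time approach, we choose to define the predictor as the projection mapping onto the closed span of the subset $\{X_n, \dots, X_{n+1-k}\}$ of the Hilbert space $L^2(\Omega, \mathcal{F}, P)$ with inner product $\langle X, Y \rangle = E(XY)$. Consequently the coefficients of $\Phi$ verify the following equations, which are called the $k$-th order Yule-Walker equations:

$$\forall j \in [1, k],\quad \sum_{i=1}^{k} a_{i,k}\, \sigma(i-j) = \sigma(j).$$

The mean-squared prediction error is:

$$E\big[(\widetilde X_n(1) - X_{n+1})^2\big] = c(0)^2\, E(\eta_{n+1}^2) = E(\eta_{n+1}^2).$$

We may write the moving average representation of $(\eta_n)_{n\in\mathbb{Z}}$ in terms of $(\varepsilon_n)_{n\in\mathbb{Z}}$:

$$\eta_n = \sum_{j=0}^{\infty} \sum_{k=0}^{\min(j,p)} \Phi_k\, b(j-k)\, \varepsilon_{n-j} = \sum_{j=0}^{\infty} t(j)\, \varepsilon_{n-j}$$

with

$$\forall j \in \mathbb{N},\quad t(j) = \sum_{k=0}^{\min(j,p)} \Phi_k\, b(j-k), \tag{16}$$

where $\Phi_k$ denotes the coefficient of $z^k$ in $\Phi$ and $p$ its degree. Finally we obtain:

$$E\big[(\widetilde X_n(1) - X_{n+1})^2\big] = \sum_{j=0}^{\infty} t(j)^2\, \sigma_\varepsilon^2.$$

In the spectral approach, minimizing the prediction error is equivalent to minimizing a contrast between two spectral densities:

$$\int_{-\pi}^{\pi} \frac{f(\lambda)}{g(\lambda, \Phi)}\, d\lambda$$

where $f$ is the spectral density of $X_n$ and $g(\cdot, \Phi)$ is the spectral density of the AR($p$) process defined by the polynomial $\Phi$ (see for example [Yajima, 1993]), so:

$$\int_{-\pi}^{\pi} \frac{f(\lambda)}{g(\lambda, \Phi)}\, d\lambda = \int_{-\pi}^{\pi} \Big|\sum_{j=0}^{\infty} b_j e^{-ij\lambda}\Big|^2 \big|\Phi(e^{-i\lambda})\big|^2\, d\lambda = \int_{-\pi}^{\pi} \Big|\sum_{j=0}^{\infty} t(j)\, e^{-ij\lambda}\Big|^2 d\lambda = 2\pi \sum_{j=0}^{\infty} t(j)^2.$$

In both approaches we need to minimize $\sum_{j=0}^{\infty} t(j)^2$.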
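The Yule-Walker equations of the time approach can be solved recursively. The sketch below uses the Levinson-Durbin recursion, a standard technique that is not named in the text and is swapped in here for illustration, applied to the autocovariances of fractionally integrated noise with $\sigma_\varepsilon^2 = 1$. The returned weights correspond to the $a_{j,k}$ of Section 3.1; all function names are hypothetical.

```python
import math

def fd_autocov(d, n):
    """sigma(0..n) of F(d) with unit innovation variance."""
    s = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for j in range(1, n + 1):
        s.append(s[-1] * (j - 1 + d) / (j - d))
    return s

def levinson_durbin(sigma, k):
    """Solve the Yule-Walker equations of orders 1..k.

    Returns (phi, v, pacf): phi[j-1] is the weight of X_{n+1-j} in the
    order-k predictor, v its mean-squared error, pacf the successive
    reflection (partial autocorrelation) coefficients."""
    phi, v, pacf = [], sigma[0], []
    for n in range(1, k + 1):
        kappa = (sigma[n] - sum(phi[j] * sigma[n - 1 - j] for j in range(n - 1))) / v
        phi = [phi[j] - kappa * phi[n - 2 - j] for j in range(n - 1)] + [kappa]
        v *= 1.0 - kappa * kappa
        pacf.append(kappa)
    return phi, v, pacf

d, k = 0.3, 30
sigma = fd_autocov(d, k)
phi, v, pacf = levinson_durbin(sigma, k)

# Residual of the k-th order Yule-Walker system.
residual = max(abs(sum(phi[i] * sigma[abs(i - j)] for i in range(k)) - sigma[j + 1])
               for j in range(k))
print(residual, v)
```

For F($d$) the reflection coefficients are known in closed form, $d/(n-d)$, which gives an independent check of the recursion.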
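3.3 Rate of Convergence of the Error by AR($k$) Model Fitting

In the next theorem we derive an asymptotic expression for the prediction error by fitting autoregressive models to the series:

Theorem 3.3.1. Assume that $(X_n)_{n\in\mathbb{Z}}$ is a long-memory process which verifies the assumptions of Section 1. If $0 < d < \frac{1}{2}$:

$$E\big[(\widetilde X_k(1) - X_{k+1})^2\big] - \sigma_\varepsilon^2 = O(k^{-1}).$$

Proof. Since fitting an AR($k$) model minimizes the forecast error using $k$ observations, the error by using truncation is bigger. Since the truncation method involves an error bounded by $O(k^{-1})$, we obtain:

$$E\big[(\widetilde X_k(1) - X_{k+1})^2\big] - \sigma_\varepsilon^2 = O(k^{-1}).$$

Consequently we only need to prove that this rate of convergence is attained. This is the case for the fractionally integrated processes defined in (12). We want to express the error made when fitting an AR($k$) model in terms of the Wiener-Kolmogorov truncation error. Note first that the variance of the white noise series is equal to:

$$\sigma_\varepsilon^2 = \int_{-\pi}^{\pi} f(\lambda)\, \Big|\sum_{j=0}^{+\infty} a_j e^{ij\lambda}\Big|^2 d\lambda.$$

Therefore in the case of a fractionally integrated process F($d$) we need only show that:

$$\int_{-\pi}^{\pi} f(\lambda)\, \Big|\sum_{j=0}^{+\infty} a_j e^{ij\lambda}\Big|^2 d\lambda - \frac{\sigma_\varepsilon^2}{2\pi} \int_{-\pi}^{\pi} \frac{f(\lambda)}{g(\lambda, \Phi_k)}\, d\lambda \sim C k^{-1}. \tag{17}$$

Setting $a_{j,k} = 0$ if $j > k$, we have:

$$\begin{aligned}
&\int_{-\pi}^{\pi} f(\lambda)\, \Big|\sum_{j=0}^{+\infty} a_j e^{ij\lambda}\Big|^2 d\lambda - \frac{\sigma_\varepsilon^2}{2\pi} \int_{-\pi}^{\pi} \frac{f(\lambda)}{g(\lambda, \Phi_k)}\, d\lambda \\
&\quad= \int_{-\pi}^{\pi} f(\lambda) \Big( \Big|\sum_{j=0}^{+\infty} a_j e^{ij\lambda}\Big|^2 - \Big|\sum_{j=0}^{k} a_{j,k}\, e^{ij\lambda}\Big|^2 \Big) d\lambda \\
&\quad= \sum_{j=0}^{+\infty}\sum_{l=0}^{+\infty} \big(a_j a_l - a_{j,k} a_{l,k}\big)\, \sigma(j-l) \\
&\quad= \sum_{j=0}^{+\infty}\sum_{l=0}^{+\infty} \big(a_j a_l - a_{j,k} a_l\big)\, \sigma(j-l) + \sum_{j=0}^{+\infty}\sum_{l=0}^{+\infty} \big(a_{j,k} a_l - a_{j,k} a_{l,k}\big)\, \sigma(j-l) \\
&\quad= \sum_{j=0}^{+\infty} (a_j - a_{j,k}) \sum_{l=0}^{+\infty} a_l\, \sigma(l-j) + \sum_{j=0}^{k} a_{j,k} \sum_{l=0}^{+\infty} (a_l - a_{l,k})\, \sigma(j-l).
\end{aligned} \tag{18}$$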
We first study the first term of the sum (18). For any $j > 0$, we have $\sum_{l=0}^{+\infty} a_l\, \sigma(l-j) = 0$:

$$\begin{aligned}
\varepsilon_n &= \sum_{l=0}^{\infty} a_l X_{n-l} \\
X_{n-j}\, \varepsilon_n &= \sum_{l=0}^{\infty} a_l X_{n-l} X_{n-j} \\
E(X_{n-j}\, \varepsilon_n) &= \sum_{l=0}^{\infty} a_l\, \sigma(l-j) \\
E\Big( \sum_{l=0}^{\infty} b_l\, \varepsilon_{n-j-l}\, \varepsilon_n \Big) &= \sum_{l=0}^{\infty} a_l\, \sigma(l-j)
\end{aligned}$$

and we conclude that $\sum_{l=0}^{+\infty} a_l\, \sigma(l-j) = 0$ because $(\varepsilon_n)_{n\in\mathbb{Z}}$ is an uncorrelated white noise. We can thus rewrite the first term of (18) as:

$$\sum_{j=0}^{+\infty} (a_j - a_{j,k}) \sum_{l=0}^{+\infty} a_l\, \sigma(l-j) = (a_0 - a_{0,k}) \sum_{l=0}^{+\infty} a_l\, \sigma(l) = 0$$

since $a_0 = a_{0,k} = 1$ according to the definition. Next we study the second term of the sum (18):

$$\sum_{j=0}^{k} a_{j,k} \sum_{l=0}^{+\infty} (a_l - a_{l,k})\, \sigma(j-l).$$

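And we obtain that:

$$\sum_{j=0}^{k} a_{j,k} \sum_{l=0}^{+\infty} (a_l - a_{l,k})\, \sigma(j-l) = \sum_{j=1}^{k}\sum_{l=1}^{k} (a_{j,k} - a_j)(a_l - a_{l,k})\, \sigma(j-l)$$

$$\qquad + \sum_{j=1}^{k} (a_{j,k} - a_j) \sum_{l=k+1}^{+\infty} a_l\, \sigma(j-l) \tag{19}$$

$$\qquad + \sum_{j=0}^{k} a_j \sum_{l=1}^{k} (a_l - a_{l,k})\, \sigma(j-l) \tag{20}$$

$$\qquad + \sum_{j=0}^{k} a_j \sum_{l=k+1}^{+\infty} a_l\, \sigma(j-l).$$

Similarly, since $\sum_{l=0}^{+\infty} a_l\, \sigma(j-l) = 0$ for $j > 0$, we rewrite the term (19):

$$\sum_{j=1}^{k} (a_{j,k} - a_j) \sum_{l=k+1}^{+\infty} a_l\, \sigma(j-l) = -\sum_{j=1}^{k} (a_{j,k} - a_j) \sum_{l=0}^{k} a_l\, \sigma(j-l).$$

We then remark that this is equal to (20). Hence it follows that:

$$\begin{aligned}
\sum_{j=0}^{k} a_{j,k} \sum_{l=0}^{+\infty} (a_l - a_{l,k})\, \sigma(j-l)
&= \sum_{j=1}^{k}\sum_{l=1}^{k} (a_{j,k} - a_j)(a_l - a_{l,k})\, \sigma(j-l) \\
&\quad + 2 \sum_{j=1}^{k} (a_{j,k} - a_j) \sum_{l=k+1}^{+\infty} a_l\, \sigma(j-l) \\
&\quad + \sum_{j=0}^{k} \sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(j-l).
\end{aligned} \tag{21}$$

In a similar way we can rewrite the third term of the sum (21) using Fubini's theorem:

$$\sum_{j=0}^{k} \sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(j-l) = -\sum_{j=k+1}^{+\infty} \sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(j-l).$$

This third term is therefore equal, up to its sign, to the forecast error in the method of prediction by truncation.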

In order to compare the prediction error by truncating the Wiener-Kolmogorov predictor and by fitting an autoregressive model to a fractionally integrated process F($d$), we need the sign of all the components of the sum (21). For a fractionally integrated noise, we know the explicit formula for $a_j$ and $\sigma(j)$:

$$\forall j > 0,\quad a_j = \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)} < 0 \qquad\text{and}\qquad \forall j \ge 0,\quad \sigma(j) = \frac{(-1)^j\, \Gamma(1-2d)}{\Gamma(j-d+1)\Gamma(1-j-d)}\, \sigma_\varepsilon^2 > 0.$$

In order to get the sign of $a_{j,k} - a_j$ we use the explicit formula given in [Brockwell and Davis, 1988] and we easily obtain that $a_{j,k} - a_j$ is negative for all $j \in [1, k]$:

$$\begin{aligned}
a_j - a_{j,k} &= \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)} - \frac{\Gamma(k+1)\Gamma(j-d)\Gamma(k-d-j+1)}{\Gamma(k-j+1)\Gamma(j+1)\Gamma(-d)\Gamma(k-d+1)} \\
&= -a_j \Big( -1 + \frac{\Gamma(k+1)\Gamma(k-d-j+1)}{\Gamma(k-j+1)\Gamma(k-d+1)} \Big) \\
&= -a_j \Big( \frac{k \cdots (k-j+1)}{(k-d) \cdots (k-d-j+1)} - 1 \Big) \\
&> 0
\end{aligned}$$

since $\forall j \in \mathbb{N}^*$, $a_j < 0$. To give an asymptotic equivalent for the prediction error, we use the sum given in (21). We have the sign of the three terms: the first is negative, the second is positive and the last is negative. Moreover the third is equal, up to its sign, to the forecast error by truncation and we have proved that this asymptotic equivalent has order $O(k^{-1})$. The prediction error by fitting an autoregressive model converges faster to $\sigma_\varepsilon^2$ than the error by truncation only if the second term is equivalent to $C k^{-1}$, with $C$ constant. Consequently, we search for a bound for $a_j - a_{j,k}$ given the explicit formula for these coefficients (see for example [Brockwell and Davis, 1988]):

$$\begin{aligned}
a_j - a_{j,k} &= -a_j \Big( \frac{k \cdots (k-j+1)}{(k-d) \cdots (k-d-j+1)} - 1 \Big) \\
&= -a_j \Big( \prod_{m=0}^{j-1} \frac{1 - \frac{m}{k}}{1 - \frac{d+m}{k}} - 1 \Big) \\
&= -a_j \Big( \prod_{m=0}^{j-1} \Big( 1 + \frac{\frac{d}{k}}{1 - \frac{d+m}{k}} \Big) - 1 \Big).
\end{aligned}$$



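Then we use the following inequality:

$$\forall x \in \mathbb{R},\quad 1 + x \le \exp(x)$$

which gives us:

$$\begin{aligned}
a_j - a_{j,k} &\le -a_j \Big( \exp\Big( \sum_{m=0}^{j-1} \frac{\frac{d}{k}}{1 - \frac{d+m}{k}} \Big) - 1 \Big) \\
&\le -a_j \exp\Big( \sum_{m=0}^{j-1} \frac{d}{k-d-m} \Big) \\
&\le -a_j \exp\Big( d \sum_{m=0}^{j-1} \frac{1}{k-d-m} \Big).
\end{aligned}$$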

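According to the previous inequality, we have:

$$\begin{aligned}
\sum_{j=1}^{k} (a_j - a_{j,k}) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(j-l)
&= \sum_{j=1}^{k-1} (a_j - a_{j,k}) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(j-l) + (a_k - a_{k,k}) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(k-l) \\
&\le \sum_{j=1}^{k-1} -a_j \exp\Big( d \sum_{m=0}^{j-1} \frac{1}{k-d-m} \Big) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(j-l) \\
&\quad + (-a_k) \exp\Big( d \sum_{m=0}^{k-1} \frac{1}{k-d-m} \Big) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(k-l) \\
&\le \sum_{j=1}^{k-1} -a_j \exp\Big( d \int_0^{j} \frac{dm}{k-d-m} \Big) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(j-l) + (-a_k)\, k^{\frac{3}{2}d} \sum_{l=k+1}^{+\infty} -a_l\, \sigma(k-l).
\end{aligned}$$

As the function $x \mapsto \frac{1}{k-d-x}$ is increasing, we use the Integral Test Theorem. The inequality on the second term follows from:

$$\sum_{m=0}^{k-1} \frac{1}{k-d-m} \le \frac{3}{2} \ln(k)$$

for $k$ large enough. Therefore there exists $K$ such that for all $k > K$:

$$\begin{aligned}
\sum_{j=1}^{k} (a_j - a_{j,k}) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(j-l)
&\le \sum_{j=1}^{k-1} -a_j \exp\Big( d \ln\Big( \frac{k-d}{k-d-j} \Big) \Big) \sum_{l=k+1}^{+\infty} -a_l\, \sigma(j-l) + (-a_k)\, k^{\frac{3}{2}d} \sum_{l=k+1}^{+\infty} -a_l\, \sigma(0) \\
&\le C (k-d)^{d} \sum_{j=1}^{k-1} j^{-d-1} (k-d-j)^{-d} \sum_{l=k+1}^{+\infty} l^{-d-1} (l-j)^{2d-1} + C k^{-d-1} k^{\frac{3}{2}d} k^{-d} \\
&\le \frac{C (k-d)^{d}}{k^{2}} \int_{1/k}^{1} j^{-d-1} (1-j)^{-d} \int_{1}^{+\infty} l^{-d-1} (l-j)^{2d-1}\, dl\, dj + C k^{-\frac{1}{2}d-1} \\
&\le C' (k-d)^{-2+d} + C k^{-\frac{1}{2}d-1}
\end{aligned}$$

and so the positive term has a smaller asymptotic order than the forecast error made by truncating. Therefore we have proved that in the particular case of F($d$) processes, the two prediction errors are equivalent to $C k^{-1}$ with $C$ constant. □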

The two approaches to next-step prediction, by truncation to $k$ terms or by fitting an autoregressive model AR($k$), consequently have a prediction error with the same rate of convergence $k^{-1}$. So it is interesting to study how the second approach improves the prediction. The following quotient:

$$r(k) := \frac{\sum_{j=1}^{k} (a_{j,k} - a_j) \sum_{l=1}^{k} (a_l - a_{l,k})\, \sigma(j-l) + 2 \sum_{j=1}^{k} (a_{j,k} - a_j) \sum_{l=k+1}^{+\infty} a_l\, \sigma(j-l)}{\sum_{j=k+1}^{+\infty} \sum_{l=k+1}^{+\infty} a_j a_l\, \sigma(j-l)} \tag{22}$$

is the ratio of the difference between the two prediction errors and the prediction error by truncating in the particular case of a fractionally integrated noise F($d$). Figure 3.1 shows that the prediction by truncation incurs a larger performance loss when $d \to 1/2$. The improvement reaches 50 per cent when $d > 0.3$ and $k > 20$.
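The ratio can be evaluated without infinite sums. The sketch below is an illustration under the assumption $\sigma_\varepsilon^2 = 1$, with hypothetical function names. It computes both mean-squared errors as finite quadratic forms, using the explicit F($d$) formula for $a_{j,k}$ quoted above from [Brockwell and Davis, 1988], and then forms the relative gain as the difference of the two errors divided by the truncation excess, which is how the text describes $r(k)$.

```python
import math

def fd_ar_coeffs(d, n):
    """a_0..a_n of (1 - B)^d."""
    a = [1.0]
    for j in range(1, n + 1):
        a.append(a[-1] * (j - 1 - d) / j)
    return a

def fd_autocov(d, n):
    """sigma(0..n) of F(d) with unit innovation variance."""
    s = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for j in range(1, n + 1):
        s.append(s[-1] * (j - 1 + d) / (j - d))
    return s

def fitted_coeffs(d, k):
    """a_{0,k}..a_{k,k}: a_j times Gamma(k+1)Gamma(k-d-j+1)/(Gamma(k-j+1)Gamma(k-d+1))."""
    a = fd_ar_coeffs(d, k)
    out = [1.0]
    for j in range(1, k + 1):
        log_ratio = (math.lgamma(k + 1) + math.lgamma(k - d - j + 1)
                     - math.lgamma(k - j + 1) - math.lgamma(k - d + 1))
        out.append(a[j] * math.exp(log_ratio))
    return out

def quad_form(c, s):
    """Variance of sum_j c_j X_{k+1-j}."""
    return sum(c[j] * c[l] * s[abs(j - l)]
               for j in range(len(c)) for l in range(len(c)))

def ratio_r(d, k):
    s = fd_autocov(d, k)
    trunc = quad_form(fd_ar_coeffs(d, k), s) - 1.0   # truncation excess
    fitted = quad_form(fitted_coeffs(d, k), s) - 1.0  # AR(k) fitting excess
    return (trunc - fitted) / trunc

for d in (0.1, 0.3, 0.45):
    print(d, ratio_r(d, 20), ratio_r(d, 75))
```

The values $k = 20$ and $k = 75$ are the two orders shown in Figure 3.1.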

After obtaining asymptotic equivalents for the next-step predictors, we will generalize the two methods to $h$-step prediction and aim to obtain their asymptotic behaviour as $k \to +\infty$ but also as $h \to +\infty$.

4 The h-Step Predictors 

Figure 3.1: Ratio $r(k)$, $d \in\, ]0, 1/2[$, defined in (22), plotted against $d$ (from 0.0 to 0.5) for $k = 20$ and $k = 75$.

Since we assume that the process $(X_n)_{n\in\mathbb{Z}}$ has an autoregressive representation (2) and a moving average representation (1), the linear least-squares predictor, $\hat X_k(h)$, of $X_{k+h}$ based on the infinite past $(X_j,\ j \le k)$ is given by:

$$\hat X_k(h) = -\sum_{j=1}^{\infty} a_j\, \hat X_k(h-j) = \sum_{j=h}^{\infty} b_j\, \varepsilon_{k+h-j}$$



(see for example Theorem 5.5.1 of [Brockwell and Davis, 1991]). The corresponding mean squared error of prediction is:

$$E\big[(\hat X_k(h) - X_{k+h})^2\big] = \sigma_\varepsilon^2 \sum_{j=0}^{h-1} b_j^2.$$

As the prediction step $h$ tends to infinity, the mean-squared prediction error converges to $\sigma_\varepsilon^2 \sum_{j=0}^{+\infty} b_j^2$, which is the variance $\sigma(0)$ of the process $(X_n)_{n\in\mathbb{Z}}$. But if the mean-squared prediction error is equal to $\sigma(0)$, we have no more interest in the prediction method since its error is equal to the error of predicting the future by 0. Remark that the mean-squared error increases more slowly to $\sigma(0)$ in the long-memory case than in the short-memory case since the sequence $b_j$ decays more slowly to 0. More precisely, in the case of a long-memory process, if we assume that:

$$b_j \underset{+\infty}{\sim} j^{d-1} L(j)$$

where $L$ is a slowly varying function, we can express the asymptotic behaviour of the prediction error. As $j \mapsto L^2(j)$ is also a slowly varying function according to the definition of [Zygmund, 1968],

$$b_j^2 \sim j^{2d-2} L^2(j)$$

and $j \mapsto j^{2d-2} L^2(j)$ is ultimately decreasing. The rest of the series and the integral are then equivalent and we may write:

$$\sigma(0) - E\big[(\hat X_k(h) - X_{k+h})^2\big] = \sigma_\varepsilon^2 \sum_{j=h}^{+\infty} b_j^2 \sim \sigma_\varepsilon^2 \sum_{j=h}^{+\infty} j^{2d-2} L^2(j) \sim \sigma_\varepsilon^2 \int_h^{+\infty} j^{2d-2} L^2(j)\, dj.$$

According to Proposition 1.5.10 of [Bingham et al., 1987]:

$$\sigma(0) - E\big[(\hat X_k(h) - X_{k+h})^2\big] \underset{h \to +\infty}{\sim} \sigma_\varepsilon^2\, \frac{h^{2d-1} L^2(h)}{1 - 2d}. \tag{23}$$
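The equivalent (23) can be checked for fractionally integrated noise, where $L = 1/\Gamma(d)$ and $\sigma(0) = \Gamma(1-2d)/\Gamma(1-d)^2$ when $\sigma_\varepsilon^2 = 1$. The following sketch is an illustration under those assumptions; the function names are hypothetical.

```python
import math

def fd_ma_coeffs(d, n):
    """b_0..b_n of (1 - B)^(-d)."""
    b = [1.0]
    for j in range(1, n + 1):
        b.append(b[-1] * (j - 1 + d) / j)
    return b

def infinite_past_gap(d, h):
    """sigma(0) minus the h-step mean-squared error sum_{j<h} b_j^2."""
    b = fd_ma_coeffs(d, h - 1)
    sigma0 = math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    return sigma0 - sum(x * x for x in b)

def equivalent_23(d, h):
    """Right-hand side of (23) with L = 1/Gamma(d) and unit innovation variance."""
    return h ** (2 * d - 1) / ((1 - 2 * d) * math.gamma(d) ** 2)

d = 0.3
for h in (10, 100, 2000):
    print(h, infinite_past_gap(d, h), equivalent_23(d, h))
```

The slow power-law decay of the gap is what keeps long-range forecasts informative in the long-memory case.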



In the case of a long-memory process with parameter $d$ which verifies $b_j \sim j^{d-1} L(j)$, the convergence of the mean-squared error to $\sigma(0)$ is slow as $h$ tends to infinity. On the contrary, for a moving average process of order $q$, the sequence $\sigma(0) - E\big[(\hat X_k(h) - X_{k+h})^2\big]$ is constant and equal to 0 as soon as $h > q$. More generally, we can study the case of an ARMA process, whose canonical representation is given by:

$$\Phi(B) X_t = \Theta(B) \varepsilon_t$$

where $\Phi$ and $\Theta$ are two coprime polynomials whose coefficients of degree 0 are equal to 1 and $(\varepsilon_t)$ is a white noise. $\Phi$ has no root in the unit disk $|z| \le 1$ and $\Theta$ has no root in the open disk $|z| < 1$. $b_j$ is bounded by:

$$|b_j| \le C j^{m-1} \rho^{-j}$$

where $\rho$ is the smallest absolute value of the roots of $\Phi$ and $m$ the multiplicity of the corresponding root (see for example [Brockwell and Davis, 1991], p. 92). Thus the mean-squared prediction error is bounded by:

$$\begin{aligned}
\sigma(0) - E\big[(\hat X_k(h) - X_{k+h})^2\big] &= \sigma_\varepsilon^2 \sum_{j=h}^{+\infty} b_j^2 \\
&\le \sigma_\varepsilon^2 C^2 \sum_{j=h}^{+\infty} j^{2m-2} \rho^{-2j} \\
&\le \sigma_\varepsilon^2 C^2 \sum_{j=h}^{+\infty} j^{2m-2} \exp(-2j \log(\rho)) \\
&\le \sigma_\varepsilon^2 C^2 \int_h^{+\infty} j^{2m-2} \exp(-2j \log(\rho))\, dj.
\end{aligned}$$
By using the substitution $t = 2\log(\rho)\, j$:

$$\begin{aligned}
\sigma(0) - E\big[(\hat X_k(h) - X_{k+h})^2\big] &\le \sigma_\varepsilon^2 C^2 (2\log(\rho))^{1-2m} \int_{2\log(\rho) h}^{+\infty} t^{2m-2} \exp(-t)\, dt \\
&\le \sigma_\varepsilon^2 C^2 (2\log(\rho))^{1-2m}\, \Gamma(2m-1,\, 2\log(\rho) h)
\end{aligned}$$

where $\Gamma(\cdot, \cdot)$ is the incomplete Gamma function defined in equation 6.5.3 of [Abramowitz and Stegun, 1984]. We know an equivalent of this function:

$$\Gamma(2m-1,\, 2\log(\rho) h) \underset{h \to +\infty}{\sim} (2\log(\rho) h)^{2m-2} \exp(-2\log(\rho) h).$$

We conclude that the rate of convergence is exponential. The mean-squared prediction error goes faster to $\sigma(0)$ when the predicted process is ARMA than when the process is a long-memory process.
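The contrast between the two regimes can be illustrated with the simplest short-memory example, an AR(1) process $X_t = \varphi X_{t-1} + \varepsilon_t$, for which $b_j = \varphi^j$ and the gap $\sigma(0) - E[(\hat X_k(h) - X_{k+h})^2]$ equals $\varphi^{2h}/(1-\varphi^2)$ when $\sigma_\varepsilon^2 = 1$. The sketch below compares it with the F($d$) gap. The AR(1) example, the parameter values and the function names are assumptions chosen for illustration.

```python
import math

def ar1_gap(phi, h):
    """sigma(0) - MSE(h) for an AR(1) with coefficient phi: geometric decay."""
    return phi ** (2 * h) / (1.0 - phi * phi)

def fd_gap(d, h):
    """sigma(0) - MSE(h) for F(d): sigma(0) minus sum_{j<h} b_j^2."""
    total, b = 0.0, 1.0
    for j in range(h):
        total += b * b
        b *= (j + d) / (j + 1)  # b_{j+1} = b_j * (j + d) / (j + 1)
    return math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2 - total

phi, d = 0.9, 0.3
for h in (5, 25, 50):
    print(h, ar1_gap(phi, h), fd_gap(d, h))
```

Doubling the horizon squares the geometric factor of the AR(1) gap, whereas the F($d$) gap is only multiplied by about $2^{2d-1}$.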

The $h$-step prediction is then more interesting for the long-memory process than for the short-memory process, having observed the infinite past. We consider the truncating effect next.

4.1 Truncated Wiener-Kolmogorov predictor 

In practice, we only observe a finite number of samples. We assume now that we only know $k$ observations $(X_1, \dots, X_k)$. We then define the $h$-step truncated Wiener-Kolmogorov predictor of order $k$ as:

$$\hat X'_k(h) = -\sum_{j=1}^{h-1} a_j\, \hat X'_k(h-j) - \sum_{j=1}^{k} a_{h-1+j}\, X_{k+1-j}. \tag{24}$$

We now describe the asymptotic behaviour of the mean-squared error of the predictor (24). First we write the difference between the predicted random variable and its predictor:

$$\begin{aligned}
\hat X'_k(h) - X_{k+h} &= -\sum_{j=1}^{h-1} a_j\, \hat X'_k(h-j) - \sum_{j=1}^{k} a_{h-1+j}\, X_{k+1-j} - \varepsilon_{k+h} + \sum_{j=1}^{+\infty} a_j X_{k+h-j} \\
&= -\varepsilon_{k+h} + \sum_{j=1}^{h-1} a_j \big( X_{k+h-j} - \hat X'_k(h-j) \big) + \sum_{j=1}^{k} a_{h-1+j} \big( X_{k+1-j} - X_{k+1-j} \big) + \sum_{j=k+1}^{+\infty} a_{h-1+j}\, X_{k+1-j} \\
&= -\varepsilon_{k+h} + \sum_{j=1}^{h-1} a_j \big( X_{k+h-j} - \hat X'_k(h-j) \big) + \sum_{j=k+1}^{+\infty} a_{h-1+j}\, X_{k+1-j}.
\end{aligned}$$

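We will use induction on $h$ to show that

$$\begin{aligned}
\hat X'_k(h) - X_{k+h} &= -\sum_{l=0}^{h-1} \Big( \sum_{j_1 + j_2 + \dots + j_h = l} (-1)^{\mathrm{card}(\{j_i,\ j_i \ne 0\})}\, a_{j_1} a_{j_2} \cdots a_{j_h} \Big) \varepsilon_{k+h-l} \\
&\quad + \sum_{j=k+1}^{+\infty} \Big( \sum_{i_1 + i_2 + \dots + i_h = h-1} (-1)^{\mathrm{card}(\{i_l,\ i_l \ne 0,\ l > 1\})}\, a_{j+i_1} a_{i_2} \cdots a_{i_h} \Big) X_{k+1-j}.
\end{aligned}$$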
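For $h = 2$, we have for example

$$\hat X'_k(2) - X_{k+2} = -(a_0\, \varepsilon_{k+2} - a_1\, \varepsilon_{k+1}) + \sum_{j=k+1}^{+\infty} (-a_1 a_j + a_{j+1})\, X_{k+1-j}.$$

Let $A(z)$ and $B(z)$ denote $A(z) = 1 + \sum_{j=1}^{\infty} a_j z^j$ and $B(z) = 1 + \sum_{j=1}^{\infty} b_j z^j$. Since we have $A(z) = B(z)^{-1}$, we obtain the following conditions on the coefficients:

$$\begin{aligned}
b_1 &= -a_1 \\
b_2 &= -a_2 + a_1^2 \\
b_3 &= -a_3 + 2 a_1 a_2 - a_1^3 \\
&\ \,\vdots
\end{aligned}$$

So we obtain:

$$\hat X'_k(h) - X_{k+h} = -\sum_{l=0}^{h-1} b_l\, \varepsilon_{k+h-l} + \sum_{j=k+1}^{+\infty} \Big( \sum_{m=0}^{h-1} a_{j+h-1-m}\, b_m \Big) X_{k+1-j}. \tag{25}$$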
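The same identity between the two power series gives the weights of the observations in the truncated predictor itself: $\hat X'_k(h) = -\sum_{j=1}^{k} \big( \sum_{m=0}^{h-1} a_{j+h-1-m} b_m \big) X_{k+1-j}$. The sketch below is an illustration for F($d$) with hypothetical function names. It checks this closed form against the weights obtained by unrolling the recursion (24).

```python
def fd_ar_coeffs(d, n):
    """a_0..a_n of (1 - B)^d."""
    a = [1.0]
    for j in range(1, n + 1):
        a.append(a[-1] * (j - 1 - d) / j)
    return a

def fd_ma_coeffs(d, n):
    """b_0..b_n of (1 - B)^(-d)."""
    b = [1.0]
    for j in range(1, n + 1):
        b.append(b[-1] * (j - 1 + d) / j)
    return b

def weights_recursive(a, h, k):
    """Weights of X_k, ..., X_1 in X'_k(h), unrolling recursion (24) step by step."""
    steps = []  # steps[s-1][j-1] = weight of X_{k+1-j} in X'_k(s)
    for s in range(1, h + 1):
        cur = [-a[s - 1 + j] - sum(a[i] * steps[s - i - 1][j - 1] for i in range(1, s))
               for j in range(1, k + 1)]
        steps.append(cur)
    return steps[-1]

def weights_closed(a, b, h, k):
    """Same weights from the coefficient of X_{k+1-j} appearing in (25)."""
    return [-sum(a[j + h - 1 - m] * b[m] for m in range(h)) for j in range(1, k + 1)]

d, h, k = 0.3, 5, 30
a = fd_ar_coeffs(d, h + k)
b = fd_ma_coeffs(d, h)
w_rec = weights_recursive(a, h, k)
w_closed = weights_closed(a, b, h, k)
max_diff = max(abs(x - y) for x, y in zip(w_rec, w_closed))
print(max_diff)
```

The closed form avoids the intermediate predictors and costs $O(hk)$ operations instead of $O(h^2 k)$.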



Since the process $(\varepsilon_n)_{n\in\mathbb{Z}}$ is uncorrelated, the two terms of the sum (25) are orthogonal and we can rewrite the mean-squared error:

$$E\big[(\hat X'_k(h) - X_{k+h})^2\big] = \sum_{l=0}^{h-1} b_l^2\, \sigma_\varepsilon^2 \tag{26}$$

$$\qquad\qquad + E\Big[ \Big( \sum_{j=k+1}^{+\infty} \Big( \sum_{m=0}^{h-1} a_{j+h-1-m}\, b_m \Big) X_{k+1-j} \Big)^2 \Big]. \tag{27}$$

The first part of the error (26) is due to the prediction method and the second (27) is due to the truncation of the predictor. We now approximate the error term (27) by using (3) and (4). We obtain the following upper bound:

$$\begin{aligned}
\forall \delta > 0,\quad \Big| \sum_{m=0}^{h-1} a_{j+h-1-m}\, b_m \Big|
&\le \sum_{m=1}^{h-1} |a_{j+h-1-m}\, b_m| + |b_0\, a_{j+h-1}| \\
&\le C_1 C_2 \int_0^{h} (j+h-1-l)^{-d-1+\delta}\, l^{d-1+\delta}\, dl + C_1 (j+h)^{-d-1} \\
&\le C_1 C_2\, h^{-1+2\delta} \int_0^{1} \Big( \frac{j}{h} + 1 - l \Big)^{-d-1+\delta} l^{d-1+\delta}\, dl + C_1 (j+h)^{-d-1} \\
&\le C_1 C_2\, h^{d+2\delta}\, j^{-d-1+\delta} \int_0^{1} l^{d-1+\delta}\, dl + C_1 (j+h)^{-d-1}.
\end{aligned} \tag{28}$$
This bound is in fact an asymptotic equivalent for the fractionally integrated process F($d$) because, in that case, the sequences $a_j$ and $b_j$ have constant signs. Using Proposition 2.2.1 for the one-step prediction, we have:

Proposition 4.1.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a linear stationary process defined by (1), (2) and possessing the features (3) and (4). We can approximate the mean-squared prediction error of $\hat X'_k(h)$ by:

$$\forall \delta > 0,\quad E\big[(\hat X'_k(h) - X_{k+h})^2\big] = \sum_{l=0}^{h-1} b_l^2\, \sigma_\varepsilon^2 + O\big(h^{2d+\delta}\, k^{-1+\delta}\big). \tag{29}$$

Having $k$ observations, we search for the steps $h$ for which the variance of the predictor has $\sigma(0)$ as an upper bound. The truncation error then has the asymptotic bound $O(h^{2d} k^{-1})$. We want to choose $h$ so that this error is negligible with respect to the information given by the linear least-squares predictor given the infinite past (see (23)), and we obtain:

$$h^{2d} k^{-1} = o\big(h^{2d-1}\big)$$

and then $h = o(k)$. With the truncated Wiener-Kolmogorov predictor, it is interesting to compute the $h$-step predictor from $k$ observations only if $h = o(k)$.
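The condition can be read off by forming the quotient of the two orders of magnitude:

$$\frac{h^{2d}\, k^{-1}}{h^{2d-1}} = \frac{h}{k},$$

which tends to 0 exactly when $h = o(k)$. As a purely illustrative example, with $k = 1000$ observations a horizon $h = 10$ gives a quotient of $0.01$, whereas $h = 500$ gives $0.5$: in the second case the truncation error is of the same order as the remaining predictive information $\sigma(0) - E[(\hat X_k(h) - X_{k+h})^2]$.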

4.2 The k-th Order Linear Least-Squares Predictor 

For the next-step predictor, when we fitted an autoregressive process, we searched for the linear least-squares predictor knowing the finite past $(X_1, \dots, X_k)$; the predictor is then the projection of the random variable onto the past. Let $\widetilde X_k(h)$ denote the projection of $X_{k+h}$ onto the span of $(X_1, \dots, X_k)$. $\widetilde X_k(h)$ verifies the recurrence relationship

$$\widetilde X_k(h) = -\sum_{j=1}^{k} a_{j,k}\, \widetilde X_k(h-j)$$

where $\widetilde X_k(h-j)$ is the direct linear least-squares predictor of $X_{k+h-j}$ based on the finite past $(X_1, \dots, X_k)$. By induction, we obtain the predictor as a function of $(X_1, \dots, X_k)$:

$$\widetilde X_k(h) = -\sum_{j=1}^{k} c_{j,k}\, X_{k+1-j}.$$

Since $\widetilde X_k(h)$ is the projection of $X_{k+h}$ onto $(X_1, \dots, X_k)$ in $L^2$, the vector $(c_{j,k})_{1 \le j \le k}$ minimizes the mean-squared error:

$$E\big[(\widetilde X_k(h) - X_{k+h})^2\big] = \int_{-\pi}^{\pi} f(\lambda)\, \Big| \exp(i\lambda(h-1)) + \sum_{j=1}^{k} c_{j,k} \exp(-i\lambda j) \Big|^2 d\lambda.$$

The vector $(c_{j,k})_{1 \le j \le k}$ is a solution of the equation:

$$\nabla_c\, E\big[(\widetilde X_k(h) - X_{k+h})^2\big] = 0$$

where $\nabla_c$ is the gradient. The vector $(c_{j,k})_{1 \le j \le k}$ is then equal to:

$$(c_{j,k})_{1 \le j \le k} = -\Sigma_k^{-1}\, (\sigma(h-1+j))_{1 \le j \le k}, \tag{30}$$

where $\Sigma_k = (\sigma(i-j))_{1 \le i, j \le k}$ is the covariance matrix of $(X_1, \dots, X_k)$. The corresponding mean squared error of prediction is given by:

$$\begin{aligned}
E\big[(\widetilde X_k(h) - X_{k+h})^2\big] &= \int_{-\pi}^{\pi} f(\lambda)\, \Big| \exp(i\lambda(h-1)) + \sum_{j=1}^{k} c_{j,k} \exp(-i\lambda j) \Big|^2 d\lambda \\
&= \sigma(0) + 2\, {}^t(c_{j,k})_{1 \le j \le k}\, (\sigma(h-1+j))_{1 \le j \le k} + {}^t(c_{j,k})_{1 \le j \le k}\, \Sigma_k\, (c_{j,k})_{1 \le j \le k} \\
&= \sigma(0) - {}^t(\sigma(h-1+j))_{1 \le j \le k}\, \Sigma_k^{-1}\, (\sigma(h-1+j))_{1 \le j \le k}.
\end{aligned}$$

The matrix $\Sigma_k$ is symmetric positive definite and the prediction error of this method is always lower than $\sigma(0)$.

As $\widetilde X_k(h)$ is the projection of $X_{k+h}$ onto $(X_1, \dots, X_k)$, the mean-squared prediction error is also lower than the prediction error of the truncated Wiener-Kolmogorov predictor (see Figure 4.1). The part of the mean-squared error due to the projection onto the span of $(X_1, \dots, X_k)$ tends to zero at least as fast as the part due to the truncation of the least-squares predictor. For the one-step predictor, we have shown that the two methods can have the same rate of convergence.
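The ordering of the three mean-squared errors can be reproduced for fractionally integrated noise. The sketch below is an illustration only, assuming $\sigma_\varepsilon^2 = 1$; the function names and the small Gaussian-elimination solver are hypothetical helpers. It computes the infinite-past error $\sum_{l<h} b_l^2$, the error of the truncated predictor (24) and the error of the projection given by (30).

```python
import math

def fd_ar_coeffs(d, n):
    a = [1.0]
    for j in range(1, n + 1):
        a.append(a[-1] * (j - 1 - d) / j)
    return a

def fd_ma_coeffs(d, n):
    b = [1.0]
    for j in range(1, n + 1):
        b.append(b[-1] * (j - 1 + d) / j)
    return b

def fd_autocov(d, n):
    s = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for j in range(1, n + 1):
        s.append(s[-1] * (j - 1 + d) / (j - d))
    return s

def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mse_three_ways(d, h, k):
    """Returns (MMSE, TPMSE, LLSPE) for F(d), horizon h, k observations."""
    a, b, s = fd_ar_coeffs(d, h + k), fd_ma_coeffs(d, h), fd_autocov(d, h + k)
    rhs = [s[h - 1 + j] for j in range(1, k + 1)]  # Cov(X_{k+h}, X_{k+1-j})
    mmse = sum(b[l] ** 2 for l in range(h))
    # Truncated predictor: weight of X_{k+1-j} is -sum_m a_{j+h-1-m} b_m.
    w = [-sum(a[j + h - 1 - m] * b[m] for m in range(h)) for j in range(1, k + 1)]
    tpmse = (s[0] - 2 * sum(w[j] * rhs[j] for j in range(k))
             + sum(w[j] * w[l] * s[abs(j - l)] for j in range(k) for l in range(k)))
    # Projection: sigma(0) - rhs^T Sigma_k^{-1} rhs.
    cov = [[s[abs(i - j)] for j in range(k)] for i in range(k)]
    x = solve(cov, rhs)
    llspe = s[0] - sum(rhs[j] * x[j] for j in range(k))
    return mmse, tpmse, llspe

d, k = 0.4, 30
sigma0 = fd_autocov(d, 0)[0]
for h in (1, 5, 20):
    print(h, mse_three_ways(d, h, k), sigma0)
```

A dense solver is enough for small $k$; for larger orders a Toeplitz-specific recursion would be the natural replacement.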
Figure 4.1: Mean-squared error of $\hat X_k(h)$ (MMSE), $\hat X'_k(h)$ (TPMSE) and $\widetilde X_k(h)$ (LLSPE) for $d = 0.4$ and $k = 80$